In recent years, support vector regression (SVR) models have been widely
applied in short-term electricity load forecasting. A critical challenge when
applying the SVR model is determining its optimal hyperparameters, which can
be addressed with optimization methods such as the grid search algorithm.
Another challenge that affects both the response time and the precision of the
SVR model is the normalization of the input data. In this paper, a grid search
algorithm based on data normalization methods, including Z-score, min-max,
max, decimal, sigmoidal, and softmax, is proposed and then used to evaluate
both the response time and the precision. To verify the proposed methods,
actual electricity load demand data from two regions, Queensland (Australia)
and Ho Chi Minh City (Vietnam), were used in this study.
1. INTRODUCTION

Electricity load forecasting plays a major role in an electricity system, where it supports
production planning, operational planning, and planning for future development [1]-[5]. There are a
variety of solutions to predict the electricity load, including multiple regression, exponential smoothing,
autoregressive integrated moving average (ARIMA), and artificial neural networks (ANNs) [6]-[12]. In the
past decades, support vector regression (SVR) has emerged as a promising solution to electricity load
forecasting [13]-[20]. Typically, the prediction precision of the SVR model relies on its hyperparameters,
including ε (error tolerance), C (penalty parameter), the kernel function, and the kernel parameters.
Therefore, it is crucial to find optimal values of the SVR hyperparameters. Several optimization methods,
such as grid search, random search, and genetic algorithms, have been studied for this challenge, of which
the grid search algorithm is widely applied in many works [21]-[35].

Another factor that affects the running time and the precision of the SVR algorithm is the
characteristics of the input data. Data normalization, therefore, was adopted in many studies for SVR models
[13], [16], [36], [37]. However, the studies that used the grid search algorithm did not consider data
normalization [28]-[35]. As a result, the grid search may miss the best hyperparameter values because the
data were not normalized. Moreover, the running time of the model may increase considerably without
data normalization.

To address these problems, this study combines different data normalization techniques with
the grid search method for SVR hyperparameter tuning. First, the input data are partitioned into two
distinct sets, the training and testing datasets. The training dataset is then used for the training process,
in which the grid search method is employed to obtain a set of optimal hyperparameters for each
data normalization method. Next, the prediction errors of the optimal SVR models are evaluated in the
testing process using the testing dataset. Finally, the electric load data of Ho Chi Minh City, Vietnam,
and Queensland, Australia are used to verify the results.

This paper is structured as follows. Sections 2 and 3 introduce the SVR model, the SVR
hyperparameters, the grid search algorithm, the mathematical models of the data normalization techniques,
and the SVR grid search algorithm based on data normalization. The experimental results and their
evaluation are presented in section 4. Lastly, section 5 gives the conclusions.


2. RESEARCH METHOD 
2.1. SVR model 

Consider a sample dataset $\{x_i, y_i\}, \forall i = 1, \ldots, N$, with N the number of samples,
$x_i \in \mathbb{R}^n$ the input vector, and $y_i \in \mathbb{R}$ the corresponding output. The crucial principle of SVR is the
non-linear mapping of the input vector x into a feature space of higher dimension by using a feature function
$\varphi(\cdot): \mathbb{R}^n \to \mathbb{R}^m$. The SVR function, which defines the correlation between the input and the target, is
given by (1) [13], [16]-[18]:

$$f(x) = \omega^T \varphi(x) + b \qquad (1)$$


In (1), ω denotes the weight coefficient and b denotes the bias of the function. They can be determined by
minimizing the regularized risk function R, given by (2):

$$R = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{N} L_\varepsilon(y_i, f(x_i)) \qquad (2)$$


In (2), the first component, $\|\omega\|^2$, is known as the regularization term, the second component represents
the empirical error between the actual and the predicted values, C is the penalty coefficient that balances
these two quantities, $L_\varepsilon$ is the ε-insensitive loss function defined by (3), and
the error tolerance ε determines the constraints on f(x), as presented in Figure 1.


Figure 1. Illustration of ε, ξ_i, ξ_i* of the SVR model

$$L_\varepsilon(y, f(x)) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases} \qquad (3)$$

The two slack variables ξ_i, ξ_i* are introduced to indicate how far the data points may deviate
from the margin ε, the so-called ε-tube. From Figure 1, ξ_i and ξ_i* can be calculated as (4):


$$|y_i - f(x_i)| - \varepsilon = \xi_i, \quad \text{over the tube}$$
$$|y_i - f(x_i)| - \varepsilon = \xi_i^*, \quad \text{under the tube} \qquad (4)$$
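
As a minimal sketch of (3) and (4), the following computes the ε-insensitive loss and the corresponding slack values for a single point. The function names and the sample values are hypothetical and only illustrate the definitions; they are not taken from the authors' implementation.

```python
def epsilon_insensitive_loss(y, f_x, eps):
    """Loss (3): zero inside the eps-tube, distance beyond the tube otherwise."""
    return max(0.0, abs(y - f_x) - eps)


def slack_variables(y, f_x, eps):
    """Slack values (4) as a pair (xi, xi_star).

    xi is non-zero when the point lies over the tube (y > f_x + eps),
    xi_star when it lies under the tube (y < f_x - eps).
    """
    excess = epsilon_insensitive_loss(y, f_x, eps)
    if y > f_x:
        return excess, 0.0
    return 0.0, excess


# Hypothetical values: one point inside the tube, one above it.
print(epsilon_insensitive_loss(5.00, 4.95, eps=0.1))
print(slack_variables(5.5, 5.0, eps=0.1))
```

Summing `xi + xi_star` over all samples gives the empirical-error part of the risk function before it is weighted by C.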


By combining (4) with (3) and (2), the regularized risk function R can be re-written as (5) and follows the 
constraints (6): 


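$$R = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \qquad (5)$$

$$y_i - (\omega^T \varphi(x_i) + b) \le \varepsilon + \xi_i$$
$$(\omega^T \varphi(x_i) + b) - y_i \le \varepsilon + \xi_i^*$$
$$\xi_i, \xi_i^* \ge 0 \qquad (6)$$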


The function f(x) is determined by applying the Lagrange function as given in (7), where $\alpha_i$, $\alpha_i^*$ represent
the Lagrange multipliers and $K(x_i, x)$ denotes the kernel function, which is defined as the dot product of
$\varphi(x_i)^T$ and $\varphi(x)$:

$$f(x) = \omega^T \varphi(x) + b = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) K(x_i, x) + b \qquad (7)$$


Some conventional kernel functions widely used in SVR are given in (8)-(10):

$$\text{Linear:} \quad K(x, y) = x^T y \qquad (8)$$
$$\text{RBF:} \quad K(x, y) = e^{-\gamma \|x - y\|^2} \qquad (9)$$
$$\text{Sigmoid:} \quad K(x, y) = \tanh(\gamma x^T y + r) \qquad (10)$$

with x, y the inputs, r ≥ 0 the intercept constant, and γ > 0 the main parameter of the kernel function.
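
The three kernels in (8)-(10) can be written directly from their definitions. The sketch below uses plain Python lists as vectors; the function names are illustrative assumptions, not part of any particular library.

```python
import math


def linear_kernel(x, y):
    """Kernel (8): dot product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma):
    """Kernel (9): exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma, r=0.0):
    """Kernel (10): tanh(gamma * <x, y> + r)."""
    return math.tanh(gamma * linear_kernel(x, y) + r)


# Hypothetical two-dimensional inputs.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), rbf_kernel(u, v, gamma=0.1), sigmoid_kernel(u, v, gamma=0.01))
```

Note that only the RBF and sigmoid kernels take γ as an argument; the linear kernel has no kernel parameter.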


2.2. SVR hyperparameters 

A machine learning model has two different types of parameters. The first type
consists of model parameters, which are learned during model training; the second consists of
hyperparameters, which are set before training begins rather than learned. Based on the SVR model in
section 2.1, the hyperparameters that control the model performance are the following [21], [30]-[32]:

- The ε parameter, which determines the constraints on f(x);
- The C parameter, which balances the regularization term against the empirical error;
- The kernel function: linear, RBF, or sigmoid;
- The kernel parameter γ.

Therefore, it is critical to find optimal values of these hyperparameters to enhance the prediction
performance of the SVR model. Several optimization techniques can be applied for this purpose, such as grid
search, random search, and genetic algorithms. Of these methods, the grid search algorithm is selected in this
study because of its simplicity and effectiveness.


2.3. Grid search method 

Grid search is a search through a grid of candidate configurations that is pre-specified by
combining different values of the hyperparameters. The optimal hyperparameters are those of the model
that produces the smallest error [28]-[35]. Figure 2 shows an example of the grid search
algorithm with two hyperparameters, ε and γ. The ε hyperparameter is configured with three values
{ε1, ε2, ε3}. Similarly, {γ1, γ2, γ3} are the configuration values of the γ hyperparameter. Combining these
two hyperparameters therefore gives 3 × 3 = 9 pairs, and the grid search algorithm searches for the
best model among these 9 pairs.
Figure 2. Illustration of grid search algorithm
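
The nine-pair example of Figure 2 can be sketched as an exhaustive loop over the Cartesian product of the two value lists. The numeric values and the `toy_error` function below are hypothetical placeholders: in the actual procedure the error would come from training and validating an SVR model for each pair.

```python
from itertools import product

eps_values = [1e-3, 1e-2, 1e-1]    # hypothetical {eps1, eps2, eps3}
gamma_values = [1e-2, 1e-1, 1.0]   # hypothetical {gamma1, gamma2, gamma3}


def grid_search(error_fn, eps_values, gamma_values):
    """Evaluate every (eps, gamma) pair and return the best pair and the grid size."""
    pairs = list(product(eps_values, gamma_values))
    best_pair = min(pairs, key=lambda pair: error_fn(*pair))
    return best_pair, len(pairs)


def toy_error(eps, gamma):
    """Placeholder error surface with its minimum at (1e-2, 1e-1)."""
    return (eps - 1e-2) ** 2 + (gamma - 1e-1) ** 2


best_pair, n_pairs = grid_search(toy_error, eps_values, gamma_values)
print(n_pairs, best_pair)
```

The cost of the search grows with the product of the list lengths, which is why the running time per model evaluation matters so much for grid search.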
In general, the performance of SVR models can be estimated using a variety of error
measures that quantify the difference between the actual and predicted values. Popular error measures
include the mean absolute error (MAE) and the root mean square error (RMSE). Their formulas are
given in (11) [29], [35], where M is the number of samples, $y_i$ the actual value, and $\hat{y}_i$ the predicted value:

$$RMSE = \sqrt{\frac{1}{M} \sum_{i=1}^{M} |y_i - \hat{y}_i|^2}, \qquad MAE = \frac{1}{M} \sum_{i=1}^{M} |y_i - \hat{y}_i| \qquad (11)$$
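
The two measures in (11) translate directly into code. This is a small sketch with hypothetical function names and sample values, using only the standard library.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over M paired samples."""
    m = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / m)


def mae(y_true, y_pred):
    """Mean absolute error over M paired samples."""
    m = len(y_true)
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / m


# Hypothetical actual and predicted loads.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(actual, predicted), mae(actual, predicted))
```

Because RMSE squares each deviation before averaging, a single large miss raises it more than it raises MAE, as the example values show.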


2.4. Cross-validation procedure 

Machine learning in general, and SVR in particular, can suffer from overfitting, in which a model
performs very well in the training process but poorly on new datasets. One technique that can be used
to limit overfitting in the grid search algorithm is k-fold cross-validation [31].
This method partitions a given dataset into k subsets (folds), of which (k-1) folds are used for
training and the remaining fold is used to validate the resulting model. The model is therefore
trained and validated k times. The results of the k rounds are then averaged to give an estimate
of the model performance. Figure 3 shows an example of the k-fold cross-validation method with k=5. The
given dataset is split into 5 folds. In the first round, the model takes the 4 folds from fold 2 to fold 5 for
training, and the remaining fold (fold 1) is retained for validation. This process is then repeated in the
second to fifth rounds, with folds 2 to 5 used in turn as the validation fold. Averaging the results of the 5
rounds improves the reliability of the performance estimate.
Figure 3. Cross validation procedure
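
The fold rotation of Figure 3 can be sketched as follows. The sketch assumes contiguous, unshuffled folds, which the paper does not specify; `fit`, `predict`, and `error_fn` are hypothetical callables standing in for SVR training, SVR prediction, and one of the error measures in (11).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds; each fold validates once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def cross_val_error(fit, predict, error_fn, X, y, k):
    """Average the validation error over the k rounds."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in val_idx]
        errors.append(error_fn([y[i] for i in val_idx], preds))
    return sum(errors) / k


# With 10 samples and k=5, the first round validates on samples 0 and 1.
for train_idx, val_idx in k_fold_indices(10, 5):
    print(val_idx, train_idx)
```

In a grid search, `cross_val_error` would be called once per hyperparameter combination, and the combination with the lowest averaged error would be kept.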


2.5. Data normalization 

Several studies have shown that the prediction performance of SVR is strongly affected by the magnitude
and the fluctuation of the input data. As a result, data normalization is required for both the training and
testing processes of the model. Different data normalization techniques have been investigated for SVR
models in previous works [36], [37], of which zero-mean, min-max, max, decimal, sigmoid, and softmax are
selected in this study. The mathematical equations of these techniques are shown in Table 1, where
$x_{mean}$, $x_{std}$, $x_{min}$, and $x_{max}$ are the mean, standard deviation, minimum, and maximum values of x,
respectively, and j is the smallest integer such that $\max(|x'|) < 1$.


Table 1. Equations of normalization techniques

Order  Normalization  Equation
1      Zero-Mean      $x' = (x - x_{mean}) / x_{std}$
2      Min-Max        $x' = (x - x_{min}) / (x_{max} - x_{min})$
3      Max            $x' = x / x_{max}$
4      Decimal        $x' = x / 10^{j}$
5      Sigmoid        $x' = 1 / (1 + e^{-a})$, $a = (x - x_{min}) / x_{std}$
6      Softmax        $x' = (1 - e^{-a}) / (1 + e^{-a})$, $a = (x - x_{min}) / x_{std}$
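
To make the first four rows of Table 1 concrete, the sketch below applies them to a short list of values. It makes two assumptions that the paper does not state: the population standard deviation is used for the zero-mean method, and the statistics are computed over the list being normalized. The sigmoid and softmax rows are left out of this sketch. The sample values are the minimum, median, and maximum of the Queensland training data in dataset #1 (Table 2).

```python
import statistics


def zero_mean(x):
    """Row 1: subtract the mean and divide by the (population) standard deviation."""
    mean, std = statistics.mean(x), statistics.pstdev(x)
    return [(v - mean) / std for v in x]


def min_max(x):
    """Row 2: rescale to the interval [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]


def max_scale(x):
    """Row 3: divide by the maximum value."""
    hi = max(x)
    return [v / hi for v in x]


def decimal_scale(x):
    """Row 4: divide by 10**j, with j the smallest integer giving max(|x'|) < 1."""
    max_abs = max(abs(v) for v in x)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in x], j


loads = [4304.46, 5532.44, 6982.23]
print(min_max(loads))
print(decimal_scale(loads))
```

For these values the decimal method selects j = 4, so the loads are simply divided by 10,000.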
3. GRID SEARCH ALGORITHM BASED ON DATA NORMALIZATION FOR SVR MODEL

As discussed in section 2, the precision of the SVR model depends on its hyperparameters, and the grid
search algorithm combined with the cross-validation procedure provides an effective way to obtain
optimal values for them. At the same time, data normalization also affects the response time and the
performance of the SVR model. Therefore, this study combines the grid search
algorithm with different data normalization techniques to evaluate the response time as well as the
precision of the SVR model. The proposed method is shown in Figure 4. The algorithm was trained and
tested in the following steps:

- Step 1: The original sample data were processed to provide two input-target pairs, (Xtrain, Ytrain)
and (Xtest, Ytest), for the training and testing datasets, respectively.

- Step 2: The training and testing datasets were normalized using each of the normalization techniques
described in section 2.5.

- Step 3: The grid search method was applied to obtain the optimal SVR hyperparameters CFGop.
Here, CFG is the set of all combinations of SVR hyperparameters: CFG = {cfg_i}, i = 1, ..., N, with N
the number of combinations and cfg_i = {ε_i, C_i, kernel_i, γ_i}. The cross-validation technique was
also applied in this step to improve the performance of the grid search algorithm.

- Step 4: The SVR model with the optimal hyperparameters was used to produce the predicted value Y'predict.

- Step 5: Because Y'predict was a normalized value, an inverse normalization process was required to
obtain the original-scale Ypredict.

- Step 6: Using (11), the prediction error of the SVR model was calculated from the difference between
Ypredict and Ytest.

In the above procedure, step 3 is the training process, and steps 4, 5, and 6 form the testing
process. The whole process was applied with each data normalization technique described in
section 2.5. The corresponding results were recorded to evaluate the performance of these techniques.
Figure 4. The SVR grid search algorithm based on data normalization
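
Steps 2 to 6 can be sketched end to end as below. This is an illustration under loudly stated assumptions, not the authors' implementation: a one-dimensional ridge regression through the origin (with penalty `lam`) stands in for the SVR model so that the example runs with the standard library only, min-max is used as the normalization technique, the normalization statistics are taken from the training data, and the number of training samples is assumed to be divisible by k. All function names are hypothetical.

```python
def min_max_apply(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def min_max_invert(values, lo, hi):
    return [v * (hi - lo) + lo for v in values]


def ridge_fit(x, y, lam):
    # Stand-in for SVR training: slope of a 1-D ridge regression through the origin.
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def mae(y_true, y_pred):
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def cv_error(x, y, lam, k):
    # k-fold cross-validation with contiguous folds (len(x) divisible by k).
    n, errors = len(x), []
    fold = n // k
    for f in range(k):
        val = range(f * fold, (f + 1) * fold)
        train = [i for i in range(n) if i not in val]
        w = ridge_fit([x[i] for i in train], [y[i] for i in train], lam)
        errors.append(mae([y[i] for i in val], [w * x[i] for i in val]))
    return sum(errors) / k


def run_pipeline(x_train, y_train, x_test, y_test, lam_grid, k=4):
    # Step 2: normalize inputs and targets with training-set statistics.
    x_lo, x_hi = min(x_train), max(x_train)
    y_lo, y_hi = min(y_train), max(y_train)
    xn_train = min_max_apply(x_train, x_lo, x_hi)
    yn_train = min_max_apply(y_train, y_lo, y_hi)
    xn_test = min_max_apply(x_test, x_lo, x_hi)
    # Step 3: grid search with cross-validation on the training data.
    best_lam = min(lam_grid, key=lambda lam: cv_error(xn_train, yn_train, lam, k))
    # Step 4: predict with the best configuration (still in normalized units).
    w = ridge_fit(xn_train, yn_train, best_lam)
    yn_pred = [w * v for v in xn_test]
    # Step 5: inverse normalization back to the original scale.
    y_pred = min_max_invert(yn_pred, y_lo, y_hi)
    # Step 6: error between the predictions and the test targets.
    return best_lam, mae(y_test, y_pred)


# Hypothetical data following y = 2x + 10.
x_train = [float(i) for i in range(8)]
y_train = [2 * v + 10 for v in x_train]
print(run_pipeline(x_train, y_train, [8.0, 9.0], [26.0, 28.0], [0.0, 0.1, 1.0]))
```

Swapping the min-max helpers for another technique from Table 1, and `ridge_fit` for an actual SVR, would repeat the same six steps once per normalization method.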


4. RESULTS AND DISCUSSION 
4.1. Data description 

To verify the reliability of the suggested algorithm, half-hourly load demand data of Queensland
(Australia) and hourly load demand data of Ho Chi Minh City (Vietnam) were used as inputs in the
experiments. In addition, the data of each location were arranged into datasets #1 and #2, which have
different time lengths and different statistical characteristics, as shown in Table 2. Datasets #1 and #2
are independent, and the experiments were performed on each of them separately.
Table 2. The characteristics of datasets #1 and #2

Dataset  Location          Subset   Time               Size        Min      Median   Max
#1       Queensland        X Train  26/04/14-23/05/14  (1344, 48)  4304.46  5532.44  6982.23
#1       Queensland        X Test   24/05/14-30/05/14  (336, 48)   4404.48  5591.45  6824.76
#1       Ho Chi Minh City  X Train  25/11/18-22/12/18  (672, 24)   1347.7   2917.94  3945.9
#1       Ho Chi Minh City  X Test   23/12/18-29/12/18  (168, 24)   1873.9   2844.65  3695.2
#2       Queensland        X Train  29/03/14-23/05/14  (2688, 48)  4279.21  5589.60  6984.78
#2       Queensland        X Test   24/05/14-30/05/14  (336, 48)   4404.48  5591.46  6824.76
#2       Ho Chi Minh City  X Train  28/10/18-22/12/18  (1344, 24)  1347.70  2952.69  3945.90
#2       Ho Chi Minh City  X Test   23/12/18-29/12/18  (168, 24)   1873.90  2877.92  3760.3


Figure 5 shows the waveforms of Ytrain for dataset #1 in Table 2 with different data normalization
methods, namely none (unnormalized data), zero-mean, min-max, max, decimal, sigmoid, and softmax,
from top to bottom. In particular, Figure 5(a) illustrates the waveforms for the
Queensland data, while Figure 5(b) shows those for the Ho Chi Minh City data. For dataset #1,
the error measure was RMSE, with k=4 folds. For dataset #2, the error measure was MAE,
with k=2 folds. Thus, the two datasets (#1 and #2) correspond to two values of k
(4 and 2) and two error measures (RMSE and MAE). This allows the data to be processed under
different circumstances, thereby improving the reliability of the experimental results.
Figure 5. Waveforms of Ytrain of dataset #1 (a) Queensland and (b) Ho Chi Minh City


4.2. Hyperparameter tuning 

As mentioned earlier, the SVR hyperparameters comprise the tube size ε, the regularized constant C,
the kernel function K, and the kernel function parameter γ. Their tuning values are shown in Table 3 for
both datasets #1 and #2. Combining all these values gives 176 cases corresponding to 176 possible SVR models.


Table 3. Tuning hyperparameter values

Items                         Values
Tube size ε                   1e-4, 1e-3, 1e-2, 1e-1
Regularized constant C        0.1, 1, 10, 100
Kernel function K             linear, RBF, sigmoid
Kernel function parameter γ   1e-4, 1e-3, 1e-2, 1e-1, 1
Number of combinations CFG    176
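
A full Cartesian product of the values in Table 3 would give 4 × 4 × 3 × 5 = 240 combinations, so the stated total of 176 needs an explanation. One reading that reproduces it, offered here as an assumption rather than something the paper states, is that γ is only varied for the kernels that use it (RBF and sigmoid), while the linear kernel is combined with ε and C alone: 16 + 80 + 80 = 176.

```python
from itertools import product

eps_grid = [1e-4, 1e-3, 1e-2, 1e-1]
c_grid = [0.1, 1, 10, 100]
gamma_grid = [1e-4, 1e-3, 1e-2, 1e-1, 1]

configs = []
for kernel in ("linear", "rbf", "sigmoid"):
    # Assumption: the linear kernel has no gamma, so it is not varied there.
    gammas = [None] if kernel == "linear" else gamma_grid
    for eps, c, gamma in product(eps_grid, c_grid, gammas):
        configs.append({"epsilon": eps, "C": c, "kernel": kernel, "gamma": gamma})

per_kernel = {k: sum(1 for cfg in configs if cfg["kernel"] == k)
              for k in ("linear", "rbf", "sigmoid")}
print(len(configs), per_kernel)
```

With k-fold cross-validation, each of these configurations is trained k times, which is what makes the per-model running time in the next subsection so significant.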


4.3. Experimental results 
Table 4 shows the running time, in seconds, of the training process for Queensland (QL) and Ho Chi
Minh City (HCM) on datasets #1 and #2. These running times are also illustrated in Figure 6, where
Figure 6(a) corresponds to dataset #1 and Figure 6(b) corresponds to dataset #2.

Table 5 lists the optimal SVR hyperparameters that were obtained by the grid search
algorithm during the training process for QL and HCM on datasets #1 and #2. Note that the optimal
kernel function was 'rbf' for every normalization technique on both datasets; it was 'linear' only
for the unnormalized data (none case).


Table 4. The running time (seconds) of the training process

Normalization  Dataset #1        Dataset #2
               QL       HCM      QL       HCM
None           25,958   6,129    15,230   3,725
Z-Score        1,771    214      1,272    164
Min-max        405      58       396      68
Max            232      56       253      52
Decimal        201      32       202      33
Sigmoid        376      67       370      63
Softmax        705      102      582      90
Figure 6. Running time of the training process (a) dataset #1 and (b) dataset #2


Table 5. Optimal hyperparameters of the training process (each cell: QL / HCM)

               Dataset #1                                Dataset #2                                       Kernel
Normalization  ε            C            γ               ε                   C            γ               function
None           1e-3 / 1e-2  1e-1 / 1e1   -               1e-4 / 1e-4         1e-1 / 1e-1  -               linear
Z-Score        1e-4 / 1e-2  1e2 / 1e2    1e-3 / 1e-2     [illegible] / 1e-4  1e2 / 1e2    1e-3 / 1e-2     rbf
Min-max        1e-4 / 1e-2  1e2 / 1e2    1e-2 / 1e1      1e-2 / 1e-4         1e2 / 1e2    1e-2 / 1e1      rbf
Max            1e-3 / 1e-2  1e2 / 1e2    1e-1 / 1e1      1e-3 / 1e-4         1e1 / 1e2    1e-1 / 1e1      rbf
Decimal        1e-3 / 1e-3  1e2 / 1e1    1e-1 / 1        1e-4 / 1e-3         1e1 / 1e2    1e-1 / 1        rbf
Sigmoid        1e-2 / 1e-2  1e1 / 1e2    1e-1 / 1e1      [illegible] / 1e-3  1e2 / 1e2    1e-2 / 1e-1     rbf
Softmax        1e-2 / 1e-2  1e1 / 1e2    1e-1 / 1e1      1e-4 / 1e-3         1e2 / 1e2    1e-2 / 1e-1     rbf


Table 6 shows the error measures obtained with the optimal hyperparameters that were determined
in the training process. These error measures are plotted in Figure 7. Specifically, Figure 7(a) shows the
RMSE of the model for dataset #1, while Figure 7(b) shows the MAE for dataset #2.

The testing performance of the optimal SVR models that were obtained from the training process is
given in Table 7, and the corresponding error measures are plotted in Figure 8. Figure 8(a) shows the
RMSE between the testing and predicted values, while Figure 8(b) shows the MAE between them.
Table 6. The error measures of the training process

Normalization  RMSE (MW), dataset #1   MAE (MW), dataset #2
               QL       HCM            QL       HCM
None           79.88    88.42          54.66    63.61
Z-Score        47.19    36.44          34.42    21.52
Min-max        46.06    36.79          34.95    22.42
Max            39.23    45.43          34.73    27.12
Decimal        43.72    47.33          37.49    22.73
Sigmoid        38.35    47.19          36.21    25.53
Softmax        39.24    23.00          31.30    18.04


[Figure 7 panels: (a) RMSE (training) for QL and HCM; (b) MAE (training) for QL and HCM; x-axis: normalization techniques]


Figure 7. The training performance with respect to (a) dataset #1 and (b) dataset #2 


Table 7. The error measures of the testing process

Normalization  RMSE (MW), dataset #1   MAE (MW), dataset #2
               QL       HCM            QL       HCM
None           76.51    81.75          53.43    60.32
Z-Score        44.93    55.21          35.68    35.31
Min-max        44.53    54.69          34.95    34.70
Max            40.29    57.19          35.82    36.20
Decimal        42.82    59.94          37.38    33.17
Sigmoid        41.52    60.46          37.39    36.04
Softmax        41.14    53.51          32.86    32.87


4.4. Evaluation and discussion 

The first metric to be evaluated is the running time of the grid search algorithm for the
different normalization techniques used with SVR. Table 4 and Figure 6 show that applying data
normalization significantly reduced the running time of the program. Interestingly, the
popular Z-Score technique was the slowest of the normalized runs in every case (for example, 1,771 s
for QL on dataset #1), while the Max and Decimal methods produced the shortest running times
(232 s and 201 s, respectively, in the same case).
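
The size of this reduction can be worked out directly from Table 4. The short sketch below uses the QL column of dataset #1 and reports the speed-up of each technique relative to the unnormalized run; the variable names are illustrative.

```python
# Running times in seconds, QL, dataset #1 (Table 4).
train_time = {
    "None": 25958, "Z-Score": 1771, "Min-max": 405, "Max": 232,
    "Decimal": 201, "Sigmoid": 376, "Softmax": 705,
}

baseline = train_time["None"]
speedup = {name: baseline / t for name, t in train_time.items() if name != "None"}

fastest = max(speedup, key=speedup.get)
slowest = min(speedup, key=speedup.get)
for name, factor in sorted(speedup.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {factor:6.1f}x")
```

In this column, even the slowest normalized run (Z-Score) is more than 14 times faster than the unnormalized one, and the fastest (Decimal) is roughly 129 times faster.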

Table 5 shows that different data normalization techniques led to
different optimal values of the SVR hyperparameters found by the grid search algorithm
in the training process. These optimal hyperparameters also differed between datasets #1 and #2.
Moreover, the 'rbf' function was the optimal kernel function for all methods of data
normalization; only in the none case, with unnormalized data, did the model use the 'linear' kernel
function. Tables 6 and 7 and Figures 7 and 8 show that the training and testing errors of the models using
data normalization were much smaller than those of the model using unnormalized data (none case). At the
same time, data normalization had a clear influence on the grid search algorithm. Specifically,
each data normalization technique yielded different precision scores in the training and testing processes.
It is worth noting that softmax produced the smallest errors in most cases for the training and testing
processes. Consider Table 6 and Figure 7 for the training process on dataset #1 with the Ho Chi Minh City
data. For the softmax normalization, the RMSE is 23.00 MW. This value is much smaller than that of the
other normalization types, especially the common Z-Score (36.44 MW) and
min-max (36.79 MW). For the Queensland data, softmax gave the lowest training MAE on dataset #2
(31.30 MW), although on dataset #1 its training RMSE (39.24 MW) was marginally above those of sigmoid
(38.35 MW) and max (39.23 MW). These results indicate that selecting a suitable data normalization,
such as the softmax normalization in this study, can give better precision
in the training and testing processes of the grid search algorithm for the SVR model.
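
The statement that softmax wins "in most cases" can be checked against Tables 6 and 7 by finding the technique with the smallest error in each column. The sketch below copies the table values (unnormalized rows omitted); the data structure and function name are illustrative.

```python
columns = ("RMSE #1 QL", "RMSE #1 HCM", "MAE #2 QL", "MAE #2 HCM")

training = {  # Table 6, errors in MW
    "Z-Score": (47.19, 36.44, 34.42, 21.52),
    "Min-max": (46.06, 36.79, 34.95, 22.42),
    "Max": (39.23, 45.43, 34.73, 27.12),
    "Decimal": (43.72, 47.33, 37.49, 22.73),
    "Sigmoid": (38.35, 47.19, 36.21, 25.53),
    "Softmax": (39.24, 23.00, 31.30, 18.04),
}
testing = {  # Table 7, errors in MW
    "Z-Score": (44.93, 55.21, 35.68, 35.31),
    "Min-max": (44.53, 54.69, 34.95, 34.70),
    "Max": (40.29, 57.19, 35.82, 36.20),
    "Decimal": (42.82, 59.94, 37.38, 33.17),
    "Sigmoid": (41.52, 60.46, 37.39, 36.04),
    "Softmax": (41.14, 53.51, 32.86, 32.87),
}


def best_per_column(table):
    """Return the technique with the smallest error for each column."""
    return {col: min(table, key=lambda name: table[name][i])
            for i, col in enumerate(columns)}


best_train = best_per_column(training)
best_test = best_per_column(testing)
softmax_wins = sum(v == "Softmax" for v in best_train.values()) + \
    sum(v == "Softmax" for v in best_test.values())
print(best_train)
print(best_test)
print(softmax_wins, "of", 2 * len(columns))
```

The two exceptions are both in the Queensland RMSE column of dataset #1, where sigmoid is lowest in training and max is lowest in testing.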
Figure 8. The testing performance (a) dataset #1 and (b) dataset #2


5. CONCLUSION 

This study analyzed the effects of a variety of data
normalization techniques on the grid search algorithm for determining optimal SVR hyperparameters in
electricity load forecasting. The running time and the error measures (RMSE and MAE) were
evaluated during the training and testing processes. Electric load data from both Queensland, Australia, and
Ho Chi Minh City, Vietnam, were used to verify the suggested model. Each dataset was split into
training and testing subsets to enhance the reliability of the study. The results showed that
data normalization greatly reduced the running time and yielded much smaller errors in terms of MAE
and RMSE. The results also indicated that conventional data normalization techniques such as Z-Score and
Min-Max did not guarantee the shortest running time or the smallest errors. These findings demonstrate
the feasibility of applying different data normalization methods with the grid search algorithm in
SVR models.